- Subject: Re: Z80 emulator to learn assembly?
- From: Paul Urbanus <urb@onramp.net>
- Date: 1997/01/31
- Message-ID: <32F28691.2ADC@onramp.net>
- References: <32edd486.3490767@news.airmail.net> <5cgr4r$nqb@hecate.umd.edu> <5cipc6$ra9@dinkel.civ.utwente.nl>
- Content-Type: text/plain; charset=us-Ascii
- Organization: OnRamp Technologies; ISP; Dallas/Ft Worth/Houston, TX USA
- Mime-Version: 1.0
- Newsgroups: comp.emulators.misc,comp.os.cpm
- X-Mailer: Mozilla 3.01 (Win16; I)
-
-
-
- Marcel de Kogel wrote:
- >
- > On 26 Jan 1997 23:59:23 GMT, marat@Glue.umd.edu (Marat Fayzullin)
- > wrote:
- >
- > >Rogers Cadenhead (rcade@airmail.net) wrote:
- > >: I am learning Z80 assembly language programming so that I can write
- > >: some new Colecovision games and figure out how some of my old
- > >: favorites were written.
- > >
- > >: What's the best Z80 emulator I can find for DOS or Win95 that I can
- > >: use to run the programs I'm writing? I've read that CPM emulators are
- > >: the best choice.
- > >As you are going to write Colecovision programs, a crossassembler+ColEm
- > >combination will probably be the best. Also, check AdamEm, the Coleco
- > >Adam emulator by Marcel de Kogel.
- > >
- > >Marat
- >
- > I'm working on some as well, and while writing them I found some very
- > interesting features in the VDP design (e.g. there's no such thing as
- > truly separate read and write addresses) I didn't find described
- > anywhere. While I've implemented most in ADAMEm, I didn't find out how
- > some of this really works (e.g. I get mixed results when reading VRAM
- > after setting a new write address). This is why I prefer using Mission
- > and an MSX for testing purposes; In fact, it's why I wrote Mission in
- > the first place. Of course, final testing is done on the CV itself,
- > and you'll need an MSX to run Mission natively.
- >
- > Marcel
-
- Marcel,
-
- The VDP (Video Display Processor) chip used in the Colecovision and in
- the early (MSX-1?) systems was the Texas Instruments TMS9918A, which was
- also used in the TI99/4A Home Computer. This machine came onto the
- market in 1980, in the midst of the video game/home computer boom.
- During that time, I worked for TI as a student, and in 1982 I
- co-authored (along with Jim Dramis) a game for the 99/4A called PARSEC,
- among other things. All of us game programmers always lamented the fact
- that the VDP memory was 'indirectly' mapped instead of direct, which of
- course limited the amount of raw bit pushing we could do. Anyway, I
- think the following will (hopefully) clear up your confusion regarding
- accessing VDP memory. Note that later MSX systems (MSX-2?) used a
- superset of the TMS9918A, the V9938, which was made by Yamaha. The
- following discussion applies only to the TMS9918A VDP chip.
-
- You are correct that there is only ONE memory address register in the
- VDP, and it is used for both reading and writing data. Thus, there must
- be a way to tell the VDP whether the address just written is to be used
- for reading or for writing data. This is done with one of the upper
- bits of the 16-bit address word.
-
- Since the 9918A can only address 16K bytes of memory, only 14 address
- bits (A0-A13) are needed, which leaves the upper two bits (A14-A15)
- free. Bit 15 (the most significant) must be zero when setting an
- address, while bit 14 is used to distinguish between a read and a write
- address. The following shows how this bit affects subsequent VRAM data
- accesses.
-
- VRAM address bit 14 | VRAM data access function
- ------------------------------------------------------------------------
-          0          | VRAM address specifies location to read
-                     | (initiate the read, then increment the address
-                     | counter)
- ------------------------------------------------------------------------
-          1          | VRAM address specifies location to write
-                     | (wait for the data write to the VDP before the
-                     | actual write to VRAM, then increment the address
-                     | counter)
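A minimal Python sketch of how the 16-bit address word might be built and split into the two bytes sent to the VDP, assuming the low byte goes first and the MSByte second (the order implied by the timing rules below, which refer to the MSByte as the second address byte). The function name `vdp_address_bytes` is hypothetical; on real hardware these bytes would be written to the VDP's control port.

```python
def vdp_address_bytes(addr: int, write: bool) -> tuple[int, int]:
    """Return the (first, second) bytes that load the VDP address register.

    Hypothetical helper: the 14-bit VRAM address is combined with the
    read/write flag in bit 14; bit 15 stays 0 for an address setup.
    """
    if not 0 <= addr <= 0x3FFF:
        raise ValueError("TMS9918A VRAM addresses run from 0x0000 to 0x3FFF")
    word = addr | (0x4000 if write else 0x0000)  # bit 14 = 1 means write
    low_byte = word & 0xFF                       # sent first
    high_byte = (word >> 8) & 0xFF               # MSByte, sent second
    return low_byte, high_byte


# Usage: set up a write to 0x1800, then a read from the same location.
print([hex(b) for b in vdp_address_bytes(0x1800, write=True)])   # ['0x0', '0x58']
print([hex(b) for b in vdp_address_bytes(0x1800, write=False)])  # ['0x0', '0x18']
```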
-
- If you calculate the address for writing data but then send it to the
- VDP without setting bit 14 to 1, you may see unexpected behavior. With
- bit 14=0, the VDP initiates a read cycle and then increments the
- address counter, giving the impression that the "write address" has
- been set to (address+1). As I stated before, there really is only one
- address register, so data reads and data writes both affect the same
- register.
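To make the single-register behavior concrete, here is a toy Python model; the class and method names are hypothetical. It assumes, as described above, that a read setup (bit 14 = 0) immediately fetches one byte and increments the shared address counter. Timing and any finer read-ahead details of the real chip are ignored, so treat it as an illustration of the pitfall rather than an emulator core.

```python
class ToyVdp:
    """Toy model of the TMS9918A's single VRAM address register (simplified)."""

    def __init__(self) -> None:
        self.vram = bytearray(0x4000)  # 16K of VRAM
        self.addr = 0                  # the ONE address register
        self.read_latch = 0            # byte fetched by a read setup

    def set_address(self, word: int) -> None:
        self.addr = word & 0x3FFF
        if not word & 0x4000:
            # Bit 14 = 0: read setup. Assumed here to fetch one byte
            # immediately and then increment the shared address counter.
            self.read_latch = self.vram[self.addr]
            self.addr = (self.addr + 1) & 0x3FFF

    def write_data(self, value: int) -> None:
        self.vram[self.addr] = value & 0xFF
        self.addr = (self.addr + 1) & 0x3FFF


vdp = ToyVdp()
vdp.set_address(0x1800 | 0x4000)  # write setup with bit 14 = 1
vdp.write_data(0xAA)              # lands at 0x1800 as intended
vdp.set_address(0x1900)           # bit 14 forgotten: treated as a read setup
vdp.write_data(0xBB)              # lands at 0x1901, not 0x1900
print(hex(vdp.vram[0x1900]), hex(vdp.vram[0x1901]))  # 0x0 0xbb
```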
-
- I'm sure that the VDP designers (at TI, anyway) didn't expect people to
- interleave data reads and writes without resetting the address, so any
- such undocumented operation may or may not be supported on all
- revisions of the chip. As with any 'undocumented bugs/features', I'd be
- concerned about how these 'features' are implemented in 9918A clones
- and derivatives, such as the Yamaha V9938.
-
- You should also be aware of the timing constraints placed on address
- and data accesses to the VDP RAM. Actual reading/writing of the VDP RAM
- (VRAM) by the CPU can only occur when the VDP is not reading the memory
- to generate the screen image. In some display modes, most of the memory
- bandwidth is used for generating the image, leaving little time
- (unfortunately) for the CPU to access memory. The worst case is
- Graphics modes I and II, where the VDP uses almost all of the memory
- bandwidth to generate the screen image. In these modes, only 1 memory
- access out of 16 is designated for the CPU; the rest are allocated to
- screen refresh.
-
- According to the 9918A (VDP) Data Manual, there are two timing
- constraints to be followed when accessing VRAM.
-
- 1. After the second address byte (MSByte) has been written to the VDP,
- there must be a 2 microsecond wait before any data read/write accesses
- can occur. This constraint ALWAYS applies, no matter which display mode
- is in effect or which part of the screen (active video, vertical
- sync/blanking) is being displayed. In the table below, this is referred
- to as 'VDP Delay'.
-
- 2. The second timing constraint depends on which display mode is active
- in the VDP, and which part of the screen (active video, vertical
- sync/blanking) is being displayed. The following table shows these
- timing constraints. In the table, this second delay constraint is
- referred to as 'Time waiting for an access window'.
-
-                      |            | VDP   | Time waiting for | Total
- Condition            | Mode       | Delay | an access window | time
- ------------------------------------------------------------------------
- Active display area  | Text       | 2 us  | 0 - 1.1 us       | 2 - 3.1 us
- ------------------------------------------------------------------------
- Active display area  | Graphics   | 2 us  | 0 - 5.95 us      | 2 - 8 us
-                      | I, II      |       |                  |
- ------------------------------------------------------------------------
- Within 4300 us after | All        | 2 us  | 0 us             | 2 us
- vertical interrupt   |            |       |                  |
- ------------------------------------------------------------------------
- Register 1, bit 1=0  | All        | 2 us  | 0 us             | 2 us
- (display blanked)    |            |       |                  |
- ------------------------------------------------------------------------
- Active display area  | Multicolor | 2 us  | 0 - 1.5 us       | 2 - 3.5 us
- ------------------------------------------------------------------------
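The table can be captured as a small Python lookup for estimating worst-case access times. The condition and mode keys are hypothetical labels for the table rows; the numbers come from the table, except that the Graphics I/II total is computed as 2 + 5.95 = 7.95 us, which the table rounds to 8 us.

```python
VDP_DELAY_US = 2.0  # always required after the MSByte of the address

# Worst-case wait for an access window, keyed by (screen condition, mode).
MAX_WINDOW_WAIT_US = {
    ("active", "text"): 1.1,
    ("active", "graphics"): 5.95,   # Graphics I and II
    ("active", "multicolor"): 1.5,
    ("vblank", "all"): 0.0,         # within 4300 us after vertical interrupt
    ("blanked", "all"): 0.0,        # register 1, bit 1 = 0
}


def worst_case_access_us(condition: str, mode: str = "all") -> float:
    """Worst-case time from address setup to a completed VRAM access."""
    return VDP_DELAY_US + MAX_WINDOW_WAIT_US[(condition, mode)]


for key in MAX_WINDOW_WAIT_US:
    print(key, f"{worst_case_access_us(*key):.2f} us")
```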
-
- Examination of the above access window table yields the following
- observations.
-
- 1. Always try to do large VRAM moves during the vertical retrace
- period, since that is when maximum memory bandwidth is available to
- the CPU: theoretically 500 Kbytes/sec (one byte every 2 us). This is
- especially important in Graphics modes I & II, which will be used for
- almost ALL games. Theoretically, one can move 2150 bytes (4300 us /
- 2 us) to/from VRAM in one vertical blanking period.
-
- 2. If you need to move lots of data, such as when completely changing
- screens, set the blanking bit in VDP register 1 to 0 (blanking the
- display), then read/write the data.
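A short Python sketch of both observations. Two details are assumptions not stated above, so check them against the data manual: TI numbers register bits with bit 0 as the most significant, so "register 1, bit 1" corresponds to the 0x40 mask; and a register is written by sending the value followed by a byte with bit 7 set and the register number in the low bits. Because the VDP registers cannot be read back, the sketch keeps a hypothetical software copy (`reg1_shadow`) of register 1. The helper names are also hypothetical.

```python
# Assumption: TI numbers bits MSB-first, so register 1 "bit 1" (BLANK)
# is mask 0x40 in conventional LSB-first numbering.
REG1_BLANK_MASK = 0x40


def register_write_bytes(reg: int, value: int) -> tuple[int, int]:
    """Bytes that write `value` to VDP register `reg` (0-7): value, then 0x80 | reg."""
    if not 0 <= reg <= 7:
        raise ValueError("the TMS9918A has write-only registers 0-7")
    return value & 0xFF, 0x80 | reg


def blank_display_bytes(reg1_shadow: int) -> tuple[int, int]:
    """Clear the BLANK bit in a software copy of register 1 (display off)."""
    return register_write_bytes(1, reg1_shadow & ~REG1_BLANK_MASK)


# Observation 1: one byte per 2 us during the 4300 us vertical blanking window.
ACCESS_US = 2
VBLANK_WINDOW_US = 4300
peak_bytes_per_second = 1_000_000 // ACCESS_US      # 500,000
bytes_per_vblank = VBLANK_WINDOW_US // ACCESS_US    # 2150

print(blank_display_bytes(0xE2), peak_bytes_per_second, bytes_per_vblank)
```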
-
-
- WHY DOES THE VDP NEED SO MUCH BANDWIDTH TO REFRESH THE SCREEN,
- AND OTHER STUFF YOU REALLY DON'T NEED TO KNOW ABOUT THE 9918A?
- --------------------------------------------------------------
-
- The following is provided as additional background information and may
- be considered excessive, but I give it for those who might want to
- understand how the bandwidth is used in Graphics modes I & II.
-
- First, consider the overriding priorities of the guys who designed the
- 9918A chip. Of course, the part must function, but more
- importantly, the die size must be as small as possible to keep the cost
- down. After all, this chip was targeted toward a consumer market.
-
- Now, a little background on VDP memory and pixel timing. The master
- clock for the VDP is the color burst frequency X 3. All subsequent
- calculations are for the NTSC version of the part, although the PAL
- numbers will be similar. The color burst frequency is 3.579545 MHz.
- While I don't know pi to this many digits, the color burst frequency is
- very handy to know when working with NTSC video. So, the master clock
- frequency is given by
-
- Fmaster = Fcolorburst * 3
- = 3.579545 MHz * 3
- = 10.7386 MHz
-
- The period of the master clock is given by
-
- Tmaster = 1/Fmaster
- = 1/10.7386 MHz
- = 93.12 ns (nanoseconds)
-
- Each memory access takes four master clock times, so the memory access
- time is given by
-
- Tmem = Tmaster * 4
- = 93.12 ns * 4
- = 372.5 ns = 0.3725 us
-
- The horizontal line time, or the amount of time from the start of one
- horizontal display line to the next horizontal display line is specified
- in the data sheet as Thorz = 63.695 us. So, the total number of times
- which VDP memory can be accessed in a single horizontal scan line is
- given by
-
- Mhorz = Max number of memory accesses in a horizontal line
- = Thorz/Tmem
- = 63.695 us/0.3725 us
- = 171 memory accesses per horizontal line, max
-
- We now know how many memory accesses are available to be allocated for
- display refresh and CPU accesses combined.
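The clock arithmetic above can be checked with a few lines of Python (NTSC values only; the constant names are just labels for the quantities defined in the text).

```python
F_COLORBURST_HZ = 3_579_545        # NTSC color burst frequency
F_MASTER_HZ = 3 * F_COLORBURST_HZ  # Fmaster, ~10.7386 MHz
T_MASTER_NS = 1e9 / F_MASTER_HZ    # Tmaster, ~93.12 ns
T_MEM_NS = 4 * T_MASTER_NS         # Tmem, ~372.5 ns per memory access
T_HORZ_US = 63.695                 # Thorz, from the data sheet

# Mhorz: memory accesses that fit in one horizontal line.
M_HORZ = round(T_HORZ_US * 1000 / T_MEM_NS)

print(f"Fmaster = {F_MASTER_HZ / 1e6:.4f} MHz")
print(f"Tmaster = {T_MASTER_NS:.2f} ns, Tmem = {T_MEM_NS:.1f} ns")
print(f"Mhorz   = {M_HORZ}")
```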
-
- Next, let's find out how many memory accesses are required to build up a
- single horizontal scan line in Graphics modes I or II. Any unused
- accesses can theoretically be allocated to the CPU.
-
- Any one active scan line is composed of up to six layers of graphic data
- (the background, the character plane, and up to four sprite planes),
- listed here in back-to-front order:
-
- 1. Background color (from VDP register #7)
- 2. Character pattern/color info
- 3. Sprites (min number=0, max number=4)
- NOTE: no more than 4 sprites can ever be displayed on a horizontal
- scan line
-
- Let's see how many memory accesses are required to get the data for the
- three different 'planes' described above.
-
- First, the background color requires zero memory accesses, as it is held
- in the lower 4 bits of VDP register 7.
-
- Next is the character data. There are 32 characters per scan line, and
- each character in the scan line requires the following memory accesses
- to retrieve the data required to generate the pixel data for that
- character.
-
- 1. Read character number from Pattern Name Table (PNT)
- 2. Read character bitmap data from Pattern Generator Table (PGT)
- 3. Read character color info from Pattern Color Table (PCT)
-
- As you can see, it takes three memory accesses for each character, and
- so the total number of memory accesses required per scan line to build
- up the character display plane is given by
-
- Mchar = 32 characters/scan line X 3 mem accesses/character
- = 96 memory accesses per scan line for character plane
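A Python sketch of the three fetches per character for one scan line. It assumes the Graphics I table layout (a 32-column name table, 8 pattern bytes per character, and one color byte shared by each group of 8 characters); Graphics II addresses its tables differently. The base addresses are example values rather than anything fixed by the chip, and the function name is hypothetical.

```python
def graphics1_line_fetches(vram: bytes, line: int,
                           pnt_base: int = 0x1800,
                           pgt_base: int = 0x0000,
                           pct_base: int = 0x2000) -> list[tuple[str, int]]:
    """Return the (table, address) VRAM fetches needed for one scan line."""
    fetches = []
    char_row, row_in_char = divmod(line, 8)
    for col in range(32):
        pnt_addr = pnt_base + char_row * 32 + col
        name = vram[pnt_addr]                                       # 1. character number
        fetches.append(("PNT", pnt_addr))
        fetches.append(("PGT", pgt_base + name * 8 + row_in_char))  # 2. bitmap byte
        fetches.append(("PCT", pct_base + name // 8))               # 3. color byte
    return fetches


vram = bytearray(0x4000)
vram[0x1800 + 32] = 65  # character 65 at column 0 of character row 1
fetches = graphics1_line_fetches(vram, line=10)
print(len(fetches), fetches[:3])
```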
-
- Finally, the sprite planes must be processed. The 9918A allows up to
- four sprites (out of 32) to be displayed on a scan line, and sprite #0
- has the highest priority - that is, it will be the frontmost.
-
- To determine which sprite will be visible on any given scan line, the
- Y-position of all 32 sprites must be read from the Sprite Attribute
- Table (SAT) in VRAM and compared against the current scan line number.
- When doing the compare, the Mag bit from VDP register 1 must be taken
- into account, since the magnification is in both the x and y directions.
-
- If the Y-location of the sprite is such that it is to be displayed on
- this scan line, then the sprite number (0-31) is placed in one of four
- temporary holding registers (SR0-SR3), if all four registers are not
- already filled. SR0 fills first, SR3 fills last, and SR0 specifies the
- frontmost sprite plane and SR3 specifies the rearmost sprite plane.
- While the Y-locations of these active sprites may be saved inside the
- VDP, I suspect they are not. Keeping these Y-locations would require 4
- extra holding registers, which can be eliminated by refetching the
- Y-locations later, albeit at the 'cost' of more memory accesses. Four
- extra registers would add directly to the chip die size, whereas the
- VDP user probably never notices the 'cost' of the extra memory cycles.
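The Y-compare scan can be sketched in Python as follows. This is a simplified model of the process described above, with a hypothetical function name: it compares each Y value directly with the line number, ignores other hardware details, and uses the list `selected` to stand in for SR0-SR3. Sprite height is 8 or 16 lines depending on the Size bit, doubled when the Mag bit is set.

```python
def sprites_on_line(y_positions: list[int], line: int,
                    size16: bool = False, mag: bool = False) -> list[int]:
    """Return up to four sprite numbers displayed on `line`, front to back."""
    height = (16 if size16 else 8) * (2 if mag else 1)  # Mag doubles Y size too
    selected: list[int] = []                  # stands in for SR0..SR3
    for number, y in enumerate(y_positions):  # one Y-location fetch per sprite
        if y <= line < y + height and len(selected) < 4:
            selected.append(number)           # SR0 fills first (frontmost)
    return selected


# Worst case from the text: sprites 0-27 miss this line, 28-31 hit it,
# so all 32 Y-locations must be examined.
ys = [200] * 28 + [50] * 4
print(sprites_on_line(ys, line=52))  # [28, 29, 30, 31]
```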
-
- In the worst case, the first 28 sprites, 0-27, are not displayed on a
- given scan line, but sprites 28-31 will be displayed. In this case, the
- Y-locations of all 32 sprites may have to be read. For the purposes of
- memory access calculations, we must assume that all 32 sprite
- Y-locations will have to be read. Therefore, we define the number of
- memory accesses required to test which sprites should be displayed on a
- given scan line as
-
- Msprite_test = 32 memory cycles (1 Y-location per sprite)
-
- After it is determined which sprites need to be displayed, the data for
- the four sprites (again, worst case) to be displayed must be fetched.
- For each sprite, there are 4 bytes (Y-location, X-location, pattern
- number, color/early clock) which need to be fetched from the Sprite
- Attribute Table. When the Size bit in VDP Register 1 is set to 1,
- indicating 16x16 (double size) sprites, two bytes of sprite pattern data
- must be read from the Sprite Pattern Generator Table. Again, this is the
- worst case. Therefore, six memory cycles are required for each sprite which
- is to be displayed. So, we now define the maximum number of memory
- cycles required to fetch the data needed to display four sprites on a
- scan line as
-
- Msprite_data = 4 sprites/line x 6 memory cycles/sprite
- = 24 memory cycles
-
- Now, let's summarize the maximum total number of accesses required for
- displaying four sprites on a scan line as
-
- Msprite = total number of memory accesses per scan line for sprite
- display
- = test which sprites are on this line + sprite display data
- access
- = Msprite_test + Msprite_data
- = 32 + 24
- = 56 memory cycles
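The same worst-case sprite budget, written out in Python so each term of Msprite is visible (the constant names are illustrative labels for the quantities in the text):

```python
SPRITES_TOTAL = 32           # every Y-location may have to be tested
MAX_SPRITES_PER_LINE = 4
SAT_BYTES_PER_SPRITE = 4     # Y, X, pattern number, color/early clock
PATTERN_BYTES_PER_LINE = 2   # worst case: Size bit = 1 (16x16 sprites)

m_sprite_test = SPRITES_TOTAL * 1
m_sprite_data = MAX_SPRITES_PER_LINE * (SAT_BYTES_PER_SPRITE + PATTERN_BYTES_PER_LINE)
m_sprite = m_sprite_test + m_sprite_data

print(m_sprite_test, m_sprite_data, m_sprite)  # 32 24 56
```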
-
- Whew! Finally, we can calculate the number of memory cycles used to
- refresh one active scan line in the display. This is given by
-
- Mdisplay = Mchar + Msprite
- = 96 + 56
- = 152
-
- For those of you whom I have not totally confused, the end is now in
- sight. We are ready to compute the number of memory cycles available to
- the CPU.
-
- Drum roll, please!
-
- Mcpu = Mem accesses in one horizontal scan line - display mem accesses
- = Mhorz - Mdisplay
- = 171 - 152
- = 19 memory accesses available for the CPU
-
- If the CPU can access the memory only once every 5.95 us (the worst-case
- window wait in Graphics modes I and II), then the total number of CPU
- accesses allowed in a horizontal line time is given by
-
- Mcpu_horz = horizontal line time/memory access time
- = 63.695 us/5.95 us
- = 10.7 CPU memory accesses per horizontal scan line
-
- If one rounds 10.7 up to 11, that would seem to indicate that there are
- 8 memory cycles (19-11=8) which are unused.
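Putting the whole line budget together in Python (a sketch that simply reproduces the arithmetic above; 5.95 us is the worst-case Graphics I/II window from the timing table, and the names are illustrative):

```python
import math

M_HORZ = 171                   # accesses per horizontal line
M_CHAR = 32 * 3                # name, pattern, and color fetch per character
M_SPRITE = 32 + 4 * (4 + 2)    # Y-tests plus data for four sprites
T_HORZ_US = 63.695
CPU_WINDOW_US = 5.95           # one CPU access every 16 memory cycles

m_display = M_CHAR + M_SPRITE           # 152
m_cpu = M_HORZ - m_display              # 19 cycles left over
m_cpu_horz = T_HORZ_US / CPU_WINDOW_US  # ~10.7 CPU accesses per line
unused = m_cpu - math.ceil(m_cpu_horz)  # 8 cycles never used

print(m_display, m_cpu, round(m_cpu_horz, 1), unused)  # 152 19 10.7 8
```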
-
- Perhaps those extra 8 cycles could have been used to allow a fifth
- sprite on a line. Each sprite costs a total of seven memory cycles (1
- for the Y-test, then 6 more if displayed), but the Y-tests for all 32
- sprites are already counted in Msprite_test, so a fifth sprite would
- need only 6 more cycles, leaving just two memory cycles to spare. Also,
- there are scheduling and synchronization issues involved regarding the
- sprites, and it would probably have required too much chip area to
- squeeze in that one extra sprite.
-
- Or, maybe those 19 cycles could all have been allocated to CPU accesses.
- However, remember the earlier statement that every 16th memory access
- is allocated to the CPU. It is relatively simple (cheap) to decode this
- CPU access slot from the horizontal counter inside the VDP, which is
- used for overall horizontal timing. If, instead, we divide the
- horizontal line time among all 19 cycles, we get
-
- Taccess_best = 63.695 us/19 memory cycles
- = 3.35 us between CPU data accesses
-
- If memory cycles take 372.5 ns, then the CPU could have every ninth
- memory cycle (3.35 us/0.3725 us = 9). Since 9 is not a power of two, a
- separate CPU access counter would be required, and it would take more
- chip area (cost) than a simple decode of the lower four bits of the
- horizontal counter.
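A Python sketch comparing the two slot schedules. It assumes a hypothetical horizontal counter that steps once per memory cycle from 0 to 170 (the 171 cycles computed earlier). The bit-mask test is the cheap decode of the lower four bits; the modulo-9 test stands in for the extra counter that every-ninth scheduling would need. The helper name is hypothetical.

```python
from collections.abc import Callable

CYCLES_PER_LINE = 171  # Mhorz; hypothetical counter values 0..170


def cpu_slots(is_cpu_slot: Callable[[int], bool]) -> list[int]:
    """Counter values at which the CPU would get a memory cycle."""
    return [h for h in range(CYCLES_PER_LINE) if is_cpu_slot(h)]


# Every 16th cycle: just check that the lower four counter bits are zero.
every_16th = cpu_slots(lambda h: (h & 0xF) == 0)

# Every 9th cycle: 9 is not a power of two, so no simple bit mask works.
every_9th = cpu_slots(lambda h: h % 9 == 0)

print(len(every_16th), len(every_9th))  # 11 19
```

Under this assumption the mask decode yields the 11 CPU slots per line estimated earlier, while the every-ninth schedule would use all 19 spare cycles.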
-
- To summarize, the sprites take up slightly more than 1/3 of the display
- bandwidth (56 of 152 cycles). Unfortunately, the chip designers did not
- include a way to turn off the sprites and thus allow 1 of every 4
- memory accesses to be allocated to the CPU.
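A rough Python check of that last point, under the same hypothetical 0-170 horizontal counter: with the sprite fetches removed, the character plane alone needs 96 of the 171 cycles, and a two-bit decode giving every fourth cycle to the CPU would fit in what remains. This only checks the cycle count, not whether the fixed fetch schedule would actually line up with those slots.

```python
CYCLES_PER_LINE = 171
M_CHAR = 96                              # character plane only, sprites off

free_cycles = CYCLES_PER_LINE - M_CHAR   # 75
every_4th = [h for h in range(CYCLES_PER_LINE) if (h & 0x3) == 0]

print(free_cycles, len(every_4th), len(every_4th) <= free_cycles)  # 75 43 True
```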
-
- I hope this information will be useful or educational to someone out
- there. Maybe you now have a better understanding of how the video
- hardware in the Colecovision works.
-
- Paul Urbanus
- urb@urbonix.com
-
-
- P.S. In anticipation of doing some work for the Colecovision, I built a
- single-board computer (SBC) that used the TI processor. This SBC
- attached to the expansion port of the Colecovision and used DMA (Direct
- Memory Access) to access the hardware. Since we already had a debugger
- written for the TI99/4A, I modified it for the SBC so we could learn how
- to access the Colecovision hardware. HERE'S THE PERVERSE PART - since we
- didn't know any Z-80 assembly, we wanted to examine some of the code in
- Coleco games. So we wrote a symbolic Z80 disassembler IN TI9900
- ASSEMBLY LANGUAGE. What were we thinking???
-
-
- *** ***
- * Paul Urbanus urb@urbonix.com *
- * *
- * Never wrestle with a hog - you get dirty and the hog likes it. *
- *** ***
-